Sdd/dev by sdevare-nv · Pull Request #1 · sdevare-nv/nv-opencode

sdevare-nv · 2026-06-11T18:30:09Z

Issue for this PR

Closes #

Type of change

Bug fix
New feature
Refactor / code improvement
Documentation

What does this PR do?

Please provide a description of the issue, the changes you made to fix it, and why they work. It is expected that you understand why your changes work and if you do not understand why at least say as much so a maintainer knows how much to value the PR.

If you paste a large clearly AI generated description here your PR may be IGNORED or CLOSED!

How did you verify your code works?

Screenshots / recordings

If this is a UI change, please include a screenshot or recording.

Checklist

I have tested my changes locally
I have not included unrelated changes in this PR

If you do not follow this template your PR will be automatically rejected.

The opencode CLI registers TUI subcommands (Attach, TuiThread) via eager imports in index.ts. Loading them transitively imports cli/cmd/tui/app.tsx, whose JSX is meant to compile against @opentui/solid (per the package tsconfig's jsxImportSource) but is incorrectly resolved against react/jsx-dev-runtime when Bun runs the un-bundled .ts file at runtime — react isn't a dep of packages/opencode, so the bench harness crashes at startup before the run command ever executes. The bench harness (packages/opencode/src/bench/cli.ts) only ever spawns \`bun src/index.ts run\` as a subprocess; it never invokes the TUI. Remove the imports + command registrations entirely so the cli/cmd/tui/ subtree is unreachable from the bench code path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Running opencode's CLI un-bundled (`bun src/index.ts run`) triggers cascading runtime-resolution failures: tsconfig's `jsxImportSource` isn't honored for inline `.tsx` JIT compilation (we already removed the TUI commands to dodge that), and bun's isolated install layout under `node_modules/.bun/<pkg>@<ver>/...` breaks `..`-relative `.mjs` imports inside packages like @anthropic-ai/sdk (the `internal/to-file.mjs` import from `core/uploads.mjs` fails to traverse the symlink the way Node does at runtime). opencode is meant to be shipped as a pre-bundled single file (their own `bin/opencode` is built the same way) — `bun build` resolves every transitive import statically and inlines the result, so runtime never has to. The companion gym `setup_scripts/opencode.sh` now invokes `bun build packages/opencode/src/index.ts --outdir .bench-build --entry-naming opencode.js`. This commit teaches `bench/cli.ts` to prefer that bundle when it exists, and falls back to `src/index.ts` for dev / when the build step hasn't run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`provider/models.ts:137` does `import("./models-snapshot.js")` for a static cache of models.dev metadata. Upstream generates this file at build time via `script/generate.ts` (network fetch from models.dev), and `.gitignore` correctly excludes it as a build artifact. For the bench harness we register a single custom provider (`@opencode-ai/nemo-gym`) in the per-instance opencode config and never consult the snapshot. But `bun build --target=bun packages/opencode/src/index.ts ...` still needs the import target to exist at static-analysis time, otherwise it errors with: error: Could not resolve: "./models-snapshot.js" Force-add empty stubs so the bundle build succeeds without running `script/generate.ts` (which needs network + adds 5+ seconds). The runtime `try:` lambda in models.ts handles an empty snapshot fine. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…onfig `PermissionActionConfig` accepts a glob-keyed map only for file/shell-style permissions (edit, bash). webfetch / websearch use a different shape (a single literal action), so emitting `{"*": "deny"}` for them trips the config validator at startup: Error: Configuration is invalid at .../opencode.jsonc ↳ Expected PermissionActionConfig | undefined, got {"*":"deny"} agent.swe-bench.permission.webfetch ↳ Expected PermissionActionConfig | undefined, got {"*":"deny"} agent.swe-bench.permission.websearch Both tools are already disabled via the agent's `tools:` block (webfetch: false, websearch: false), so the permission entries are redundant. Removing them lets the per-instance config load cleanly and the run command actually start. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

opencode's bundled-provider loader invokes the factory with \`{ name, ...options }\` where `options` is the merged user + custom provider defaults from `provider.ts`. For some providers opencode injects an empty/static `headers: {}` into that merged options dict (distinct from upstream openai-compatible's schema where `headers` is a function `() => Record<string, string | undefined>`). We were unconditionally invoking `this.cfg.headers?.()`. The optional chain only short-circuits on null/undefined, so when opencode passed a plain object it threw at session-start: Error: this.cfg.headers is not a function. (In 'this.cfg.headers?.()', 'this.cfg.headers' is an instance of Object) Branch on `typeof headers` to support both shapes: call it if it's a function, spread it directly if it's a plain object. Either way the resulting kv pairs are merged into the per-request HTTP headers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The constructor merged caller config over defaults via: this.cfg = { requestTimeoutMs: 600_000, retries: 3, ...cfg } opencode's provider loader invokes the factory with options that include explicit `undefined` for optional fields. Our factory in `provider/sdk/nemo-gym/index.ts` faithfully passes those through: new NemoGymLanguageModel(modelId, { ..., requestTimeoutMs: opts.requestTimeoutMs, // undefined retries: opts.retries, // undefined }) The `...cfg` spread then overwrites the 600_000 / 3 defaults with `undefined`. At runtime `setTimeout(fn, undefined)` is treated as `setTimeout(fn, 0)` — so the abort timer fires before `fetch` can even hand off the request. Result: NeMoGym chat completions failed after 3 attempts: AbortError: The operation was aborted. with ~10 ms total elapsed between step_start and the error event. Swap to the spread-first, coalesce-with-`??` pattern so undefineds fall through to the defaults instead of clobbering them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When the input row's responses_create_params.model is unset, the gym side hands opencode an empty model string. opencode's session/provider resolver then falls back to its sentinel `"default"` (or threads the empty string straight through). We were faithfully POSTing \`model: "default"\` to the gym openai_model server, which forwarded it to OpenAI: The model `default` does not exist or you do not have access to it. The gym openai_model server already has the right pattern for this: body_dict.setdefault("model", self.config.openai_model) so omitting `model` from the outbound body lets the server fill in the policy-configured model name (e.g. gpt-4.1-2025-04-14) without any client-side knowledge of what the policy actually points to. Guard the model assignment in `_buildRequestParams` to do exactly that. The companion gym-side change is in OpenCodeHarnessProcessor.get_run_command: when body.model is empty, fall back to the resolved model_server_cfg.openai_model (or .model for vllm) before writing the per-instance config, so opencode also sees a real string when it's populating the agent's model field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The todowrite tool is just an in-session planning aid — explicit multi-step task lists help the model decompose non-trivial fixes without affecting token-ID contiguity or subagent dispatch. Was disabled defensively; flipping it on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

Strip git history past base_commit so the agent can't reach future commits via `git checkout <future-sha>` / `git log` exploration. Two-pass: careful per-ref iteration first; nuclear batch-delete fallback for monorepos with thousands of refs (e.g. datadog-agent with 5k+ release tags would time out the careful pass). Runs once in cli.ts before spawning the opencode session, scoped to the per-instance workspace, using the base_commit field from the instance JSONL. `|| true` at the tail so a busted git state can't kill the rollout. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

Mark each phase + each ref delete/reset with a `[deep_reset:careful]` or `[deep_reset:nuclear]` line so the agent log shows what happened. Mirrors the per-step `echo` markers in nv-OpenHands' run_infer.py deep-reset path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

- language-model.ts: read sessionID/parentSessionID from `x-session-affinity` / `x-parent-session-id` request headers (set by opencode's session/llm.ts for non-opencode providers). Per-session turn counter prevents subagent dumps from clobbering the main session's. Filename includes sessionID so each session's per-turn JSONs sit side-by-side; payload now carries session_id, parent_session_id, and turn for downstream reconstruction of the agent tree. - cli.ts: new --enable-subagents flag; toggles `task: <bool>` in the per-instance opencode.jsonc. Default off. - run_infer.sh: forwards ENABLE_SUBAGENTS env -> --enable-subagents flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

Replaces the 12-line bench-local DEFAULT_SYSTEM_PROMPT with the real anthropic system prompt opencode ships (`session/prompt/anthropic.txt`, ~105 lines), plus a short SWE-bench addendum (don't commit/format the diff, harness captures git diff as the patch, don't edit tests). `--system-prompt <path>` override still wins when supplied. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

The env block in session/system.ts (cwd, worktree, platform, today's date) is appended to the system prompt every turn. The `Today's date` field uses `new Date().toDateString()` — across a midnight rollover the system message shifts, which breaks the RL prompt-token-prefix invariant (turn[N+1].prompt_token_ids must extend turn[N]'s). Bench mode sets OPENCODE_DISABLE_ENV_PROMPT=1 in the child env so the environment() effect short-circuits to []. Normal `opencode run` is unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

Some apptainer images ENOENT on bare program names through Bun's posix_spawn (libuv falls back to PATH-less execv on some kernels / musl builds). Symptoms in the wild: [bench] deep_reset spawn error: ENOENT posix_spawn 'bash' [bench] wrote ... error=opencode_exit_999 # bun spawn also ENOENT Fixes: - deep_reset.ts: probe /bin/bash, /usr/bin/bash, /bin/sh, /usr/bin/sh and spawn the first that exists. The script is POSIX-compatible so /bin/sh works fine when bash is absent (minimal/distroless SIFs). Skip deep_reset with a warning if no shell is found. - cli.ts runOpencode: use process.execPath (the currently-running bun binary) instead of bare "bun". - cli.ts captureGitDiff: probe /usr/bin/git, /bin/git, /usr/local/bin/git with bare "git" as last-ditch fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

The previous fix moved to absolute binary paths but `/bin/bash` still ENOENTs through Bun's posix_spawn whenever the spawn passes a `cwd` option on minimal apptainer images (e.g., SWE-bench's astropy SIF). Root cause: libc lacks `posix_spawn_file_actions_addchdir_np`, so libuv's cwd-handling fallback fails the spawn outright instead of falling back to fork+chdir+exec. Workaround: don't pass `cwd` to spawn at all. - deep_reset.ts: prepend `cd <workspaceRoot> && ` to the shell script itself (shellQuote handles paths with spaces / special chars). - cli.ts runOpencode: drop `cwd: workspaceRoot` from the bun spawn. Opencode's existing `--dir <workspaceRoot>` flag changes its own working directory, so spawn-level cwd was redundant anyway. - captureGitDiff already uses `git -C <workspaceRoot>` (no cwd). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

Plain `git -C <workspaceRoot> diff` only reports changes to tracked files. Every new file the agent creates via the `write` tool stays untracked, so it was silently dropped from the captured patch and the rollout was recorded as `patch_exists=false` / reward 0 even though the agent ran to completion (`reason: "stop"`, bench `exit=0`). In a recent SWE-bench batch (swebench_results_1780596228320_bcd81ba7), 8 of the 21 zero-byte patches were exactly this case — 7/8 had `write` calls to brand-new paths and never staged. Example trajectory: denoland-deno-8408-agentic wrote a 6499-byte std/encoding/csv_stringify.ts from scratch, type-checked, ran a functional test, and stopped. git_patch came back "" because the file was untracked. Fix: mark untracked files as intent-to-add (`git add -AN`) so they appear in `git diff` without being committed. Worktree-style diff is preserved so the SWE-bench evaluator's `git apply` still works. Also pass `--binary` so any binary artifacts the agent produces aren't silently truncated by the textual diff path. The system-prompt addendum already tells the model not to commit or push, which is still correct after this fix — the harness now picks up unstaged new files on its own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

Some dataset SIFs (notably `swe-bench-ext`, and certain SWE-rebench variants) copy the repo into the workspace as a flat source tree with no `.git` directory. Without one: * runDeepReset's first `git rev-parse` fails, the outer `|| true` swallows the error, and deep_reset is a silent no-op. * captureGitDiff runs `git -C <workspace> diff` against a non-repo; git emits `fatal: not a git repository` to stderr and exits with empty stdout, so the function returns "". Net effect: every rollout for those SIFs lands as `patch=0 bytes` regardless of how many files the agent edited. Recent symptom — in swebench_results_1781123430109_686771d1 (swe-bench-ext, 37 instances), every patch was zero bytes despite trajectories with up to 30 successful `edit` calls on the workspace. `cd /workspace/repo && git status` in the same SIF returns `fatal: not a git repository`. Port of nv-OpenHands' run_infer.py:1142-1156 swe-bench-ext baseline: detect the missing `.git`, init a local repo, snapshot the pristine tree as `opencode_bench_baseline`, and skip deep_reset (the upstream base_commit SHA doesn't exist in the fresh repo, so deep_reset would just fall through to its nuclear pass and noisily no-op anyway). This composes with 52efa89 (`git add -AN && git diff --binary`) — the prior commit handled `.git`-present-but-untracked-write-files; this commit handles `.git`-missing-entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

Falling off the end of main() with no explicit process.exit() left Bun to drain the event loop on its own, which produced a flaky exit=1 even on successful runs that wrote a valid output.jsonl with `error=none`. The residual handles seem to come from the spawned opencode subprocess (piped stdio that hasn't been explicitly closed) and from opencode's sqlite migration code — both stay registered long enough that Bun treats the runtime drain as a non-clean exit. Symptom in DeepSeek-V4-Flash_opencode_0-0-12711152.out (SWE-rebench-V2 train shard): three instances logged [bench] wrote .../bench_run/output.jsonl (patch=N bytes, error=none) with N=2001, 9046, ... — `error=none` means `result.exitCode === 0`, so opencode itself exited cleanly and the patch was captured. But apptainer exited 1, gym's runner raised `RuntimeError("Command failed with return code 1")`, and the rollout was discarded as a failure even though the bench succeeded. Fix: explicitly `process.exit(result.exitCode === 0 ? 0 : 1)` after the [bench] wrote log line. Deterministic, mirrors opencode's exit code, and doesn't depend on the event-loop drain heuristic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Sugam Devare <sdevare@nvidia.com>

sdevare-nv and others added 21 commits May 6, 2026 15:58

init

b6089bf

bench: add nemo-gym provider + run_infer entry

1a1bf7a

feat: change nudge to user message

b9602b1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sdd/dev#1

Sdd/dev#1
sdevare-nv wants to merge 21 commits into
devfrom
sdd/dev

sdevare-nv commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sdevare-nv commented Jun 11, 2026

Issue for this PR

Type of change

What does this PR do?

How did you verify your code works?

Screenshots / recordings

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant